Outline

Intro Foul Ball Leaders Batters Pitchers Analysis 0/1 Strike vs. 2 Strike 0/1 Strike vs. K% 2 Strike vs. K% Team Foul % Scatterplot See if anyone like Acuna’s K% dropped/rose due to foul ball change Hardest hit foul balls? Dive into any players with interesting numbers like Jazz, Yordan, etc.

# Setup
library(baseballr)
library(DBI)
library(bigrquery)
library(dplyr)
library(scales)
library(plotly)
library(knitr)
library(kableExtra)

# BigQuery
bq <- dbConnect(bigrquery::bigquery(),
                project = "pjb-sports-data",
                dataset = "mlb")

# Load Statcast Data
statcast_leaderboard <- list()
for (player_type in c("batter", "pitcher")) {
  statcast_leaderboard[[player_type]] <- statcast_leaderboards(
    leaderboard = "expected_statistics", year = 2023, min_pa = 1,
    player_type = player_type
  )
}

Intro

We talk often about player tendencies to achieve certain outcomes… a hitter’s spray chart, a hitter’s launch angle breakdown, a pitcher’s pitch distribution, a pitcher’s fly ball to ground ball ratio, etc. However, there is a common event that often gets overlooked in baseball analysis: the foul ball. Every now and then, a stat about foul balls will emerge, such as how https://www.si.com/more-sports/2012/06/05/joey-vottoreds, but more attention is lent to the balls that are put in play, watched or whiffed at.

ggplot(
  dbGetQuery(bq, 'SELECT
                    description,
                    count / total `%`
                  FROM
                    (
                      SELECT
                        1 foo,
                        description,
                        COUNT(*) count
                      FROM
                        (
                          SELECT
                            REGEXP_REPLACE(description, "_*blocked_*", "") description,
                          FROM
                            `mlb.statcast_pitches`
                          WHERE
                            game_year = 2023 AND game_type = "R"
                        )
                      GROUP BY
                        description
                    )
                  JOIN
                    (
                      SELECT
                        1 foo,
                        COUNT(*) total
                      FROM
                        `mlb.statcast_pitches`
                      WHERE
                        game_year = 2023 AND game_type = "R"
                    )
                  USING
                    (foo);'),
  aes(reorder(description, -`%`), `%`)
) +
  geom_bar(stat = "identity", fill = "#0099f9") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "2023 Pitch Results", x = "", y = "Frequency") +
  scale_y_continuous(labels = percent, limits = c(0, 0.4)) +
  geom_text(aes(label = percent(`%`, accuracy = 0.1)), vjust = -0.5, size = 3)

Foul balls make up nearly 18% of pitch outcomes and can be interpreted in a number of different ways. With less than two strikes… - the pitch wasn’t really what the batter was looking for, but he swung anyway. - the batter’s swing or timing was slightly off. With two strikes… - the batter is just trying to stay alive with two strikes. - the pitch wasn’t located well enough or wasn’t deceptive enough to miss the bat entirely.

Fouls can feel like a neutral outcome when they happen on the field, but they are almost always a positive outcome for either the pitcher or the hitter. For example, with less than two strikes, a foul ball is a positive outcome for the pitcher. Certain foul balls, like a towering drive that hooks foul, can indicate that the batter has the upper hand, but it still counts like any other strike, and the pitcher can breath a sigh of relief. When there are two strikes, a foul ball is always going to feel like the batter held strong, and the pitcher is annoyed that he has to throw another pitch. In short, a foul ball is a win for the pitcher with less than 2 strikes, and it is a win for the hitter with 2 strikes. With this in mind, I wanted to split hitter/pitcher foul ball tendencies based on the count.

batters.2023.df <- statcast_leaderboard$batter %>%
  filter(pa >= 200) %>%
  mutate(Name = sub("(.+),\\s(.+)","\\2 \\1", `last_name, first_name`)) %>%
  select(-year, -`last_name, first_name`, -est_ba_minus_ba_diff,
         -est_slg_minus_slg_diff, -est_woba_minus_woba_diff) %>%
  rename(PA = pa) %>%
  # All pitches
  inner_join(
    dbGetQuery(bq, 'SELECT
                      batter player_id,
                      COUNTIF(description IN ("swinging_strike", "foul_tip",
                        "swinging_strike_blocked")) / COUNT(*) `0|1-Strike Whiff %`,
                      COUNTIF(description = "foul") / COUNT(*) `0|1-Strike Foul %`
                    FROM
                      `mlb.statcast_pitches`
                    WHERE
                      game_year = 2023 AND game_type = "R" AND strikes < 2 AND
                      /* Swing */
                      description IN ("foul", "hit_into_play", "swinging_strike",
                        "foul_tip", "swinging_strike_blocked")
                    GROUP BY
                      batter;'),
    by = "player_id"
  ) %>%
  # 2-Strikes
  inner_join(
    dbGetQuery(bq, 'SELECT
                      batter player_id,
                      COUNTIF(description IN ("swinging_strike", "foul_tip",
                        "swinging_strike_blocked")) / COUNT(*) `2-Strike Whiff %`,
                      COUNTIF(description = "foul") / COUNT(*) `2-Strike Foul %`
                    FROM
                      `mlb.statcast_pitches`
                    WHERE
                      game_year = 2023 AND game_type = "R" AND strikes = 2 AND
                      /* Swing */
                      description IN ("foul", "hit_into_play", "swinging_strike",
                        "foul_tip", "swinging_strike_blocked")
                    GROUP BY
                      batter;'),
    by = "player_id"
  )

pitchers.2023.df <- statcast_leaderboard$pitcher %>%
  filter(pa >= 200) %>%
  mutate(Name = sub("(.+),\\s(.+)","\\2 \\1", `last_name, first_name`)) %>%
  select(-year, -`last_name, first_name`, -est_ba_minus_ba_diff,
         -est_slg_minus_slg_diff, -est_woba_minus_woba_diff) %>%
  rename(TBF = pa) %>%
  # All pitches
  inner_join(
    dbGetQuery(bq, 'SELECT
                      pitcher player_id,
                      COUNTIF(description IN ("swinging_strike", "foul_tip",
                        "swinging_strike_blocked")) / COUNT(*) `0|1-Strike Whiff %`,
                      COUNTIF(description = "foul") / COUNT(*) `0|1-Strike Foul %`
                    FROM
                      `mlb.statcast_pitches`
                    WHERE
                      game_year = 2023 AND game_type = "R" AND strikes < 2 AND
                      /* Swing */
                      description IN ("foul", "hit_into_play", "swinging_strike",
                        "foul_tip", "swinging_strike_blocked")
                    GROUP BY
                      pitcher;'),
    by = "player_id"
  ) %>%
  # 2-Strikes
  inner_join(
    dbGetQuery(bq, 'SELECT
                      pitcher player_id,
                      COUNTIF(description IN ("swinging_strike", "foul_tip",
                        "swinging_strike_blocked")) / COUNT(*) `2-Strike Whiff %`,
                      COUNTIF(description = "foul") / COUNT(*) `2-Strike Foul %`
                    FROM
                      `mlb.statcast_pitches`
                    WHERE
                      game_year = 2023 AND game_type = "R" AND strikes = 2 AND
                      /* Swing */
                      description IN ("foul", "hit_into_play", "swinging_strike",
                        "foul_tip", "swinging_strike_blocked")
                    GROUP BY
                      pitcher;'),
    by = "player_id"
  )

2023 Batters (min. 200 PAs)

0/1-Strike Foul %

2-Strike Foul %

10 Highest

kable(
  batters.2023.df %>%
    select(Name, PA, `0|1-Strike Foul %`) %>%
    arrange(desc(`0|1-Strike Foul %`)) %>%
    mutate(
      `0|1-Strike Foul %` = percent(`0|1-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name PA 0|1-Strike Foul %
Jake Cronenworth 522 48.78%
Isaac Paredes 571 48.43%
Bo Naylor 230 46.54%
Ozzie Albies 660 45.89%
Nathaniel Lowe 724 45.59%
Gio Urshela 228 45.05%
Zach McKinstry 518 45.03%
Max Kepler 491 44.52%
Adley Rutschman 687 44.26%
Pavin Smith 228 44.20%
kable(
  batters.2023.df %>%
    select(Name, PA, `2-Strike Foul %`) %>%
    arrange(desc(`2-Strike Foul %`)) %>%
    mutate(
      `2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name PA 2-Strike Foul %
Yordan Alvarez 496 46.95%
Sal Frelick 223 46.75%
Geraldo Perdomo 495 46.02%
Justin Turner 626 45.81%
Ty France 665 45.59%
Isiah Kiner-Falefa 361 45.55%
Cody Bellinger 556 45.43%
Daulton Varsho 581 45.11%
Will Smith 554 44.87%
Alec Burleson 347 44.87%

10 Lowest

kable(
  batters.2023.df %>%
    select(Name, PA, `0|1-Strike Foul %`) %>%
    arrange(`0|1-Strike Foul %`) %>%
    mutate(
      `0|1-Strike Foul %` = percent(`0|1-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name PA 0|1-Strike Foul %
Christian Bethancourt 332 29.48%
William Contreras 611 29.57%
Eloy Jiménez 489 29.92%
Javier Báez 547 30.26%
Kevin Kiermaier 408 30.29%
Aaron Judge 458 30.75%
Jose Siri 364 30.82%
Joey Wiemer 410 30.87%
Jordan Walker 465 30.91%
Luke Raley 406 31.07%
kable(
  batters.2023.df %>%
    select(Name, PA, `2-Strike Foul %`) %>%
    arrange(`2-Strike Foul %`) %>%
    mutate(
      `2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name PA 2-Strike Foul %
Jazz Chisholm Jr.  383 23.53%
Mark Vientos 233 25.32%
Jose Siri 364 27.57%
Eloy Jiménez 489 27.94%
Harrison Bader 344 28.64%
Brett Baty 389 28.69%
Francisco Alvarez 423 28.87%
Christopher Morel 429 29.08%
Mickey Moniak 323 29.51%
Paul DeJong 400 29.64%
# https://plotly.com/ggplot2/configuration-options/
# Highlight POIs: https://thiyanga.netlify.app/post/scatterplot/
batter.scatter.plot.df <- batters.2023.df %>%
  mutate(diff = `2-Strike Foul %` - `0|1-Strike Foul %`,
         color = as.factor(
           ifelse(diff >= 0.06, "#025189",
                  ifelse(diff >= 0.03, "#0c9cb4",
                         ifelse(diff >= 0, "#94c280",
                                ifelse(diff >= -0.03, "#f1c359",
                                       ifelse(diff >= -0.06, "#d03f2e",
                                              "#982123"))))))) %>%
  select(Name, `2-Strike Foul %`, `0|1-Strike Foul %`, color)

ggplotly(
  ggplot(
    batter.scatter.plot.df,
    aes(x = `0|1-Strike Foul %`, y = `2-Strike Foul %`,
        text = paste(Name, "\n0/1-Strike Foul %: ",
                     percent(`0|1-Strike Foul %`, accuracy = 0.1),
                     "\n2-Strike Foul %: ",
                     percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
    geom_point(aes(color = color)) +
    scale_colour_manual(values = levels(batter.scatter.plot.df$color)) +
    labs(title = "2023 Batters (min. 200 PAs)") +
    scale_x_continuous(labels = percent) +
    scale_y_continuous(labels = percent) +
    geom_abline(linetype = "dotted") +
    theme(legend.position = "none"),
  tooltip = "text") %>%
  layout(annotations = list(
    list(text = "Hover over any point for player details", x = 0.437, y = 0.462,
         font = list(size = 10)),
    list(text = "y = x", x = 0.484, y = 0.476, font = list(size = 10),
         showarrow = FALSE))) %>%
  config(displayModeBar = FALSE)

There is a positive correlation between 0/1-Strike Foul % and 2-Strike Foul % (correlation coefficient of 0.42), but substantial variance exists, too. Take Yordan Alvarez, for example. With 0 or 1 strike, he fouls off only 34.1% of pitches, but that number increases to 46.9% with 2 strikes, which is the highest in all of MLB.

ggplotly(
  ggplot(
    dbGetQuery(bq, 'WITH
                      team_foul_pct
                    AS
                      (
                        SELECT
                          CASE inning_topbot WHEN "top" THEN away_team ELSE home_team END team,
                          strikes,
                          COUNTIF(description = "foul") fouls,
                          COUNT(*) pitches,
                        FROM
                          `mlb.statcast_pitches`
                        WHERE
                          game_year = 2023 AND game_type = "R" AND
                          description IN ("foul", "hit_into_play", "swinging_strike",
                            "foul_tip", "swinging_strike_blocked") /* Swing */
                        GROUP BY
                          team,
                          strikes
                      )
                    
                    SELECT
                      team,
                      `0|1-Strike Foul %`,
                      `2-Strike Foul %`
                    FROM
                      (
                        SELECT
                          team,
                          SUM(fouls) / SUM(pitches) `0|1-Strike Foul %`
                        FROM
                          team_foul_pct
                        WHERE
                          strikes < 2
                        GROUP BY
                          team
                      )
                    JOIN
                      (
                        SELECT
                          team,
                          SUM(fouls) / SUM(pitches) `2-Strike Foul %`
                        FROM
                          team_foul_pct
                        WHERE
                          strikes = 2
                        GROUP BY
                          team
                      )
                    USING
                      (team);'),
    aes(x = `0|1-Strike Foul %`, y = `2-Strike Foul %`,
        text = paste(team, "\n0/1-Strike Foul %: ",
                     percent(`0|1-Strike Foul %`, accuracy = 0.1),
                     "\n2-Strike Foul %: ",
                     percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
    geom_point(col = "#69b3a2") +
    labs(title = "2023 Teams (Batting)") +
    scale_x_continuous(labels = percent) +
    scale_y_continuous(labels = percent) +
    geom_abline(linetype = "dotted") +
    theme(legend.position = "none"),
  tooltip = "text") %>%
  layout(annotations = list(
    list(text = "Hover over any point for team details", x = 0.377, y = 0.3965,
         font = list(size = 10)),
    list(text = "y = x", x = 0.388, y = 0.39, font = list(size = 10),
         showarrow = FALSE))) %>%
  config(displayModeBar = FALSE)

2023 Pitchers (min. 200 batters faced)

0/1-Strike Foul %

2-Strike Foul %

10 Highest

kable(
  pitchers.2023.df %>%
    select(Name, TBF, `0|1-Strike Foul %`) %>%
    arrange(desc(`0|1-Strike Foul %`)) %>%
    mutate(
      `0|1-Strike Foul %` = percent(`0|1-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name TBF 0|1-Strike Foul %
Brusdar Graterol 257 45.57%
Chris Murphy 212 45.31%
Luis Severino 417 44.93%
Johnny Cueto 218 44.36%
Nestor Cortes 266 44.34%
Steven Wilson 219 44.21%
Joe Ryan 672 44.04%
Brad Hand 236 44.00%
Louie Varland 283 43.34%
Cody Bradford 234 43.08%
kable(
  pitchers.2023.df %>%
    select(Name, TBF, `2-Strike Foul %`) %>%
    arrange(desc(`2-Strike Foul %`)) %>%
    mutate(
      `2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name TBF 2-Strike Foul %
Ron Marinaccio 205 51.41%
Josh Hader 231 47.62%
Sean Manaea 499 46.85%
Drew Smith 244 46.73%
Reynaldo López 278 46.33%
Brock Burke 250 46.07%
Johnny Cueto 218 45.71%
Kyle Muller 372 45.66%
Jhony Brito 372 45.59%
Jake Irvin 530 45.57%

10 Lowest

kable(
  pitchers.2023.df %>%
    select(Name, TBF, `0|1-Strike Foul %`) %>%
    arrange(`0|1-Strike Foul %`) %>%
    mutate(
      `0|1-Strike Foul %` = percent(`0|1-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name TBF 0|1-Strike Foul %
Robert Stephenson 201 25.29%
Elvis Peguero 252 25.82%
Andrés Muñoz 211 27.80%
Giovanny Gallegos 229 28.28%
Alex Lange 288 28.71%
Josh Sborz 215 28.92%
Gregory Santos 289 28.96%
Bryan Abreu 287 29.15%
Josh Fleming 221 29.39%
Alex Young 236 29.79%
kable(
  pitchers.2023.df %>%
    select(Name, TBF, `2-Strike Foul %`) %>%
    arrange(`2-Strike Foul %`) %>%
    mutate(
      `2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name TBF 2-Strike Foul %
Alex Lange 288 25.00%
Yency Almonte 207 25.86%
Phil Maton 274 26.63%
Mark Leiter Jr.  269 26.83%
Alexis Díaz 286 27.66%
Quinn Priester 234 28.00%
Jordan Romano 248 28.12%
Gregory Soto 250 28.26%
Albert Abreu 268 28.66%
Drew VerHagen 268 28.82%
# https://plotly.com/ggplot2/configuration-options/
# Highlight POIs: https://thiyanga.netlify.app/post/scatterplot/
pitcher.scatter.plot.df <- pitchers.2023.df %>%
  mutate(diff = `2-Strike Foul %` - `0|1-Strike Foul %`,
         color = as.factor(
           ifelse(diff >= 0.06, "#982123",
                  ifelse(diff >= 0.03, "#d03f2e",
                         ifelse(diff >= 0, "#f1c359",
                                ifelse(diff >= -0.03, "#94c280",
                                       ifelse(diff >= -0.06, "#0c9cb4",
                                              "#025189"))))))) %>%
  select(Name, `2-Strike Foul %`, `0|1-Strike Foul %`, color)

ggplotly(
  ggplot(
    pitcher.scatter.plot.df,
    aes(x = `0|1-Strike Foul %`, y = `2-Strike Foul %`,
        text = paste(Name, "\n0/1-Strike Foul %: ",
                     percent(`0|1-Strike Foul %`, accuracy = 0.1),
                     "\n2-Strike Foul %: ",
                     percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
    geom_point(aes(color = color)) +
    scale_colour_manual(values = levels(pitcher.scatter.plot.df$color)) +
    labs(title = "2023 Pitchers (min. 200 batters faced)") +
    scale_x_continuous(labels = percent) +
    scale_y_continuous(labels = percent) +
    geom_abline(linetype = "dotted") +
    theme(legend.position = "none"),
  tooltip = "text") %>%
  layout(annotations = list(
    list(text = "Hover over any point for player details", x = 0.316, y = 0.444,
         font = list(size = 10)),
    list(text = "y = x", x = 0.461, y = 0.47, font = list(size = 10),
         showarrow = FALSE))) %>%
  config(displayModeBar = FALSE)

Do foul ball tendencies relate to K%

Hard hit foul balls?

Since Statcast still tracks the exit velocity of foul balls, I wanted to examine not just the frequency with which players hit foul balls, but also how hard they are hitting them (Avg. EV and Hard Hit %)

Players of Interest

dbDisconnect(bq)